This notebook is the third in the set of steps to run machine learning on the cloud. In this step, we will use the model trained in the previous notebook and continue by evaluating the resulting model.
Evaluation is accomplished by first running batch prediction over one or more evaluation datasets via Cloud Dataflow, and then analyzing the results using BigQuery. Using these services for evaluation allows you to scale to large evaluation datasets.
In [1]:
import google.datalab as datalab
import google.datalab.ml as ml
import mltoolbox.regression.dnn as regression
import os
The storage bucket was created earlier. We'll re-declare it here so we can use it.
In [5]:
storage_bucket = 'gs://' + datalab.Context.default().project_id + '-datalab-workspace/'
storage_region = 'us-central1'
workspace_path = os.path.join(storage_bucket, 'census')
training_path = os.path.join(workspace_path, 'training')
In [6]:
!gsutil ls -r {training_path}/model
We'll submit a batch prediction Dataflow job that loads this model into TensorFlow and runs it in evaluation mode (the mode that expects the input data to contain a value for the target). The other mode, prediction, is used to predict over data where the target column is missing.
NOTE: Batch prediction can take a few minutes to launch while compute resources are provisioned. In the case of large datasets in real-world problems, this overhead is a much smaller part of the overall job lifetime.
In [7]:
eval_data_path = os.path.join(workspace_path, 'data/eval.csv')
evaluation_path = os.path.join(workspace_path, 'evaluation')
regression.batch_predict(training_dir=training_path,
                         prediction_input_file=eval_data_path,
                         output_dir=evaluation_path,
                         mode='evaluation',
                         output_format='csv',
                         cloud=True)
Once prediction is done, the individual predictions will be written out into Cloud Storage.
In [8]:
!gsutil ls {evaluation_path}
In [9]:
!gsutil cat {evaluation_path}/csv_schema.json
!gsutil -q -m cp -r {evaluation_path}/ /tmp
!head -n 5 /tmp/evaluation/predictions-00000*
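For a quick look in Python, we can load these local copies into a pandas DataFrame. This is a minimal sketch: it assumes the copied files fit in memory, and since the prediction CSVs have no header row, we supply column names matching the schema in csv_schema.json (the same names used in the datasource definition below).
import glob
import pandas as pd

# The prediction CSVs carry no header row; apply the schema's column names.
columns = ['SERIALNO', 'predicted_target', 'target_from_input']
predictions = pd.concat([pd.read_csv(f, names=columns)
                         for f in glob.glob('/tmp/evaluation/predictions-*')],
                        ignore_index=True)
predictions.head()
To analyze the full results at scale, we next define a BigQuery datasource over the prediction files in Cloud Storage. Note that the --paths value in the next cell is spelled out for this notebook's bucket; if you are following along, substitute your own bucket (the value of evaluation_path above).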
In [10]:
%bq datasource --name eval_results --paths gs://cloud-ml-users-datalab-workspace/census/evaluation/predictions*
{
  "schema": [
    {
      "type": "STRING",
      "mode": "nullable",
      "name": "SERIALNO"
    },
    {
      "type": "FLOAT",
      "mode": "nullable",
      "name": "predicted_target"
    },
    {
      "type": "FLOAT",
      "mode": "nullable",
      "name": "target_from_input"
    }
  ]
}
In [11]:
%bq query --datasource eval_results
SELECT SQRT(AVG(error)) AS rmse FROM (
  SELECT POW(target_from_input - predicted_target, 2) AS error
  FROM eval_results
)
Out[11]:
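For small evaluation sets, you can sanity-check the same computation locally with pandas, reusing the predictions DataFrame sketched earlier:
import numpy as np

# Root mean squared error over the locally copied predictions.
errors = predictions.predicted_target - predictions.target_from_input
print('RMSE: %.3f' % np.sqrt((errors ** 2).mean()))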
In [12]:
%bq query --name distribution_query --datasource eval_results
WITH errors AS (
  SELECT predicted_target - target_from_input AS error
  FROM eval_results
),
error_stats AS (
  SELECT MIN(error) AS min_error, MAX(error) - MIN(error) AS error_range
  FROM errors
),
quantized_errors AS (
  SELECT error, FLOOR((error - min_error) * 20 / error_range) AS error_bin
  FROM errors CROSS JOIN error_stats
)
SELECT AVG(error) AS error, COUNT(error_bin) AS instances
FROM quantized_errors
GROUP BY error_bin
ORDER BY error_bin
In [13]:
%chart columns --data distribution_query --fields error,instances
Out[13]:
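For small samples, the same 20-bin quantization can be reproduced locally (again a sketch, reusing the errors series computed in the RMSE check above):
import numpy as np

# Assign each signed error to one of 20 equal-width bins, mirroring the SQL.
bins = np.floor((errors - errors.min()) * 20 / (errors.max() - errors.min()))
print(errors.groupby(bins).agg(['mean', 'count']))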
As shown with the queries and chart, SQL and BigQuery can be used to analyze large evaluation results to understand how the model performs. In this case, we looked at the overall error (RMSE) as well as the distribution of errors.
We're almost done. Now that we have a model, our next step will be to deploy it and use it for predictions. This will be covered in the next notebook.